Automating Coreference: The Role of Annotated Training Data

Authors

  • Lynette Hirschman
  • Patricia Robinson
  • John D. Burger
  • Marc B. Vilain
Abstract

We report here on a study of interannotator agreement in the coreference task as defined by the Message Understanding Conferences (MUC-6 and MUC-7). Based on feedback from annotators, we clarified and simplified the annotation specification. We then performed an analysis of disagreement among several annotators, concluding that only 16% of the disagreements represented genuine disagreement about coreference; the remaining cases were mostly typographical errors or omissions, which were easily reconciled. Initially, we measured interannotator agreement in the low 80s for precision and recall. To try to improve upon this, we ran several experiments. In our final experiment, we separated the tagging of candidate noun phrases from the linking of actual coreferring expressions. This method shows promise (interannotator agreement climbed to the low 90s), but it needs more extensive validation. These results position the research community to broaden the coreference task to multiple languages, and possibly to different kinds of coreference.
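The abstract reports agreement as precision and recall but does not give the scoring formula. The sketch below assumes the link-based scoring scheme commonly associated with the MUC coreference evaluations, treating one annotator's chains as the key and the other's as the response. The function names and the toy mention labels are hypothetical and chosen only for illustration; this is not the authors' scoring software.

```python
from typing import Dict, Hashable, List, Set, Tuple

Chain = Set[Hashable]  # one coreference chain = a set of mention identifiers


def _link_recall(key: List[Chain], response: List[Chain]) -> float:
    """Link-based recall of `response` against `key` (assumed MUC-style scoring)."""
    # Map each mention to the response chain that contains it.
    # Assumes a mention appears in at most one chain per annotator.
    owner: Dict[Hashable, int] = {
        m: i for i, chain in enumerate(response) for m in chain
    }
    found_links = 0
    total_links = 0
    for chain in key:
        if len(chain) < 2:
            continue  # a singleton contributes no coreference links
        parts = {owner[m] for m in chain if m in owner}
        # Mentions missing from the response each count as their own part.
        unaligned = sum(1 for m in chain if m not in owner)
        found_links += len(chain) - (len(parts) + unaligned)
        total_links += len(chain) - 1
    return found_links / total_links if total_links else 0.0


def agreement(key: List[Chain], response: List[Chain]) -> Tuple[float, float]:
    """Return (precision, recall); precision is recall with the roles swapped."""
    return _link_recall(response, key), _link_recall(key, response)


# Toy example: annotator A links four mentions into one chain,
# annotator B splits the same mentions into two chains.
annotator_a = [{"m1", "m2", "m3", "m4"}]
annotator_b = [{"m1", "m2"}, {"m3", "m4"}]
precision, recall = agreement(key=annotator_a, response=annotator_b)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case every link annotator B drew is also implied by annotator A's chain, so precision is perfect, while B recovers only two of the three links needed to connect A's chain. Which annotator plays the key is arbitrary for agreement studies; swapping them swaps precision and recall.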

Similar resources

Corpus based coreference resolution for Farsi text

"Coreference resolution" or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreference when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...

Competitive Self-Trained Pronoun Interpretation

We describe a system for pronoun interpretation that is self-trained from raw data, that is, using no annotated training data. The result outperforms a Hobbsian baseline algorithm and is only marginally inferior to an essentially identical, state-of-the-art supervised model trained from a substantial manually-annotated coreference corpus.

Coping With Implicit Arguments And Events Coreference

In this paper we present ongoing work on the creation of a linguistically-based system for event coreference. We assume that this task requires deep understanding of text and that statistically-based methods, both supervised and unsupervised, are inadequate. We make this choice because event coreference can only take place when argumenthood is properly computed. It is...

SUC-CORE: A Balanced Corpus Annotated with Noun Phrase Coreference

This paper describes SUC-CORE, a subset of the Stockholm Umeå Corpus and the Swedish Treebank annotated with noun phrase coreference. While most coreference annotated corpora consist of texts of similar types within related domains, SUC-CORE consists of both informative and imaginative prose and covers a wide range of literary genres and domains. This allows for exploration of coreference acros...

Extracting Bacteria Biotopes with Semi-supervised Named Entity Recognition and Coreference Resolution

This paper describes our event extraction system that participated in the bacteria biotopes task in BioNLP Shared Task 2011. The system performs semi-supervised named entity recognition by leveraging additional information derived from external resources including a large amount of raw text. We also perform coreference resolution to deal with events having a large textual scope, which may span ...

Journal title:
  • CoRR

Volume cmp-lg/9803001

Publication date: 1998